Method and allocation device for allocating pending requests for data packet transmission at a number of inputs to a number of outputs of a packet switching device in successive time slots

ABSTRACT

A method for allocating pending requests for data packet transmission at a number of inputs to a number of outputs of a switching system in successive time slots, including a matching method including the steps of providing a first request information in a first time slot indicating data packets at the inputs requesting transmission to the outputs of the switching system, performing a first step in the first time slot depending on the first request information to obtain a first matching information, providing a last request information in a last time slot successive to the first time slot, performing a last step in the last time slot depending on the last request information and depending on the first matching information to obtain a final matching information, and assigning the pending data packets at the number of inputs to the number of outputs based on the final matching information.

FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

This invention was made with Government support under Contract No. B527064 awarded by the Department of Energy. The Government has certain rights in this invention.

FIELD OF THE INVENTION

The present invention relates to the field of packet switching, specifically to the field of crossbar-based packet-switch architectures.

BACKGROUND OF THE INVENTION

Many packet switching devices are based on an input-queued architecture, comprising queues sorted per output (virtual output queues, VOQ) at every input line card, a crossbar routing fabric, and a central arbitration unit that computes which input is allowed to send to which output in every time slot. A time slot as herein understood equals the duration of one fixed size packet.

Typically, the arbitration unit is physically located close to the crossbar. Such a system provides a data path, which comprises the flow of data packets from the input line cards through the crossbar to the output line cards, and a control path, which comprises the flow of control information from the line cards to the arbiter, i.e. the request information, and back to the line cards, i.e. the grant information.

To obtain good performance, the arbitration unit should compute a matching between the inputs and outputs in every successive time slot, wherein a set of data packets from the inputs is forwarded to the respective outputs. An optimum solution for the matching is too complex to be implemented in fast hardware. Instead, heuristic iterative algorithms such as PIM, i-SLIP or DRRM are commonly used. The quality of their matching solution strongly depends on the number of iterations or steps that can be carried out in the available arbitration time, commonly one time slot. In general, O(log(N)) iterations or steps are required for adequate performance, although in the worst case these algorithms only converge in N iterations, where N is the number of ports.

As line rates continue to increase but cell sizes remain constant, the available arbitration time is shrinking, making it harder to complete enough iterations or steps to achieve an optimized matching solution. The arbitration in general requires a number of iterations (depending on the number of ports N) that may not be feasible to complete during one time slot.

One solution to this problem is to parallelize or load balance the matching process over multiple allocation units, as proposed by Oki et al. “Pipelined-based approach for maximal size matching scheduling in input-buffered switches”, IEEE Communication Letters, Vol. 5, No. 6, June 2001, pp. 363-365. To obtain one arbitration decision at every cell cycle, a number of identical parallel subschedulers are employed, each of them performing several iterations to perform the matching. One drawback of this solution is that the subschedulers always need a predetermined time until all iterations are performed before returning a matching result, even if the matching was produced in the first iteration. This produces a latency which is determined by the predetermined number of time slots used for the iterations and which cannot be reduced any further.

Another solution to the same problem is to pipeline the matching process as proposed by Nabeshima “Input-Queued Switches Using two Schedulers in Parallel”, IEICE Transactions on Communication, Vol. E85-B, No. 2, February 2002, pp. 523-531. To obtain one arbitration decision in every time slot, the matching process is overlapped over multiple subschedulers arranged in a sequential pipeline setup, each of them performing one or more iterations to optimize the matching. The main drawback of this scheme is again the minimum latency, which equals the sum of the latencies of all the subschedulers.

It is therefore an object of the present invention to provide a method and an allocation device for allocating pending requests for the transmission of data packets at a number of inputs to a number of outputs of a packet switching device according to their destination, wherein the latency of the arbitration is minimized.

It is a further object of the present invention to provide a high throughput close or equal to the maximum achievable throughput and lower latency at low utilization relative to the existing schemes.

It is another object of the present invention to provide a method which may be combined with any of the known matching algorithms commonly used.

SUMMARY OF THE INVENTION

According to a first aspect of the present invention a method for allocating pending requests for data packet transmission at a number of inputs to a number of outputs of a packet switching device in successive time slots is provided. According to a matching method the allocation of the pending requests is optimized, wherein the matching method includes a number of steps for incrementally allocating the requests. As a result of each step matching information is provided. In each time slot request information is provided, the request information indicating the data packets at the inputs requesting transmission to respective outputs. A first request information is provided in a first time slot and a first step of the matching method is performed in the first time slot depending on the first request information to obtain a first matching information. A last request information is provided in a last time slot succeeding the first time slot and a last step is performed in the last time slot depending on the last request information and depending on the first matching information to obtain a final matching information. The pending data packets at the number of inputs are assigned to the number of outputs in dependence on the final matching information.

It can be provided that the matching method is performed in a first and a second thread, which are shifted, so that the first step of the second thread and the second step of the first thread are performed in the same time slot. Thus, different threads of the matching method are performed in each of the time slots in parallel to generate the respective final matching information in every time slot.

According to one embodiment of the present invention between the first step and the last step of the matching method a number of intermediate steps are performed in successive intermediate time slots between the first time slot and the last time slot. Respective intermediate request information is provided in the respective intermediate time slot, wherein each of the steps provides intermediate matching information to a successive intermediate step depending on intermediate matching information from the preceding intermediate step and depending on request information of the respective intermediate time slot. The first step provides the first matching information to the first of the intermediate steps, and the last step receives the intermediate matching information from the last of the intermediate steps.

According to another embodiment, at least one of the intermediate steps or the last step is performed by modifying the respective intermediate or last request information depending on the respective first or intermediate matching information provided by the preceding step, wherein performing the one step depends on the modified respective request information to obtain a partial matching information.

According to another embodiment the one step of the matching method includes the merging of the intermediate or first matching information provided by the preceding step and the partial matching information from the current step to obtain the respective intermediate or final matching information.

According to another embodiment, the partial matching information is modified depending on the matching information provided by any of the steps, the partial matching information of any of the steps, the pending request information, and/or position information indicating the position of the respective step within the steps of the matching method.

According to another embodiment, each of the first, intermediate and last request information depends on the number of pending requests at each of the inputs with respect to each of the outputs.

According to another embodiment, the request information is selectively provided to the first, intermediate and last steps depending on the matching information provided by any of the steps of the matching method, the current number of pending requests of each input relative to each of the outputs, and/or a position information indicating the position of the respective step within the steps of the matching method.

According to another aspect of the present invention an allocation device for allocating pending requests for data packet transmission at a number of inputs to a number of outputs of a packet switching device in successive time slots is provided. The allocating of the pending requests is performed or optimized by a matching method, wherein the matching method includes a number of steps for incrementally allocating the requests to optimize the allocation of the data packets. The device provides a first allocation stage for performing a first step of the matching method in a first time slot depending on first request information provided in the first time slot to obtain first matching information. A last allocation stage is further provided for performing a last step of the matching method in a last time slot depending on last request information provided in the last time slot and depending on the first matching information to obtain final matching information. The respective provided request information indicates the data packets at the inputs requesting transmission to the respective outputs. By means of an allocation unit the pending data packets at the number of inputs are allocated to the number of outputs depending on the final matching information.

According to one embodiment, the allocation device further comprises one or more intermediate allocation stages which are located between the first allocation stage and the last allocation stage, are connected in series with each other and with the first and the last allocation stage, and perform a number of intermediate steps of the matching method in successive intermediate time slots between the first time slot and the last time slot. Each of the allocation stages provides intermediate matching information to a successive intermediate allocation stage, wherein the intermediate matching information depends on intermediate matching information received from the preceding intermediate allocation stage and on provided intermediate request information of the respective intermediate time slot. The first allocation stage provides the first matching information to the first of the intermediate allocation stages, and the last allocation stage receives the intermediate matching information from the last of the intermediate allocation stages.

According to another embodiment of the present invention, at least one of the allocation stages comprises a prefilter for modifying the respective intermediate and last request information depending on the respective first and intermediate matching information provided by the preceding allocation stage. The one allocation stage further comprises an allocator for performing the step of the matching method of the respective allocation stage depending on the filtered respective request information to obtain partial matching information.

According to another embodiment of the present invention, the one allocation stage further comprises a merging unit for merging the first or intermediate matching information provided by the preceding allocation stage and the partial matching information to obtain the respective intermediate or final matching information.

According to another embodiment of the present invention, at least one of the allocation stages further comprises a postfilter unit for modifying the partial matching information depending on the matching information provided by any of the allocation stages, the partial matching information of any of the allocation stages, the pending request information in the respective time slot, and/or a position information indicating the position of the respective allocation stage within the series of allocation stages.

According to another embodiment of the present invention, the allocation device further comprises a request counter unit to provide the first, intermediate and last request information depending on the number of pending requests at each of the inputs with respect to each of the outputs in the respective first, intermediate and last time slot.

According to another embodiment of the present invention, the allocation device further comprises a selection unit to selectively provide the request information to the first, intermediate and last allocation stage depending on the matching information obtained by any step of the matching method, the current number of pending requests of each input relative to each of the outputs, and/or a position information indicating the position of the respective step within the steps of the matching method.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention are discussed below in more detail together with the accompanying drawings, wherein:

FIG. 1 shows a schematic diagram of a packet switching device comprising an allocation device according to one embodiment of the present invention;

FIG. 2 shows a part of the packet switching device of FIG. 1, depicting the control between one input/output line card and the arbitration unit;

FIG. 3 illustrates the matching problem in allocation devices employed in packet switching devices with several inputs and several outputs;

FIG. 4 illustrates the iterative steps to come to an optimized matching between the inputs and the outputs;

FIGS. 5 a and 5 b illustrate the forming of the matching method according to two methods of the prior art;

FIG. 5 c illustrates the forming of the matching method according to the present invention;

FIG. 5 d is an explanatory illustration of the symbols and conventionsused in FIGS. 5 a, 5 b and 5 c.

FIG. 6 illustrates a schematic diagram of an arbitration unit in a packet switching device according to a preferred embodiment of the present invention;

FIG. 7 illustrates a more detailed diagram of a prefilter unit as shown in the embodiment of FIG. 6; and

FIG. 8 illustrates a more detailed diagram of a postfilter unit as shown in the embodiment of FIG. 6.

DESCRIPTION OF PREFERRED EMBODIMENTS

In FIG. 1 a schematic block diagram of a packet switching device is depicted. The packet switching device comprises N bidirectional full-duplex ingress/egress data links 1 that are connected to N line cards 2. Data packets to be transmitted comprise a payload and header information indicating the requested packet destination and are transmitted and received over the data links 1. Each of the line cards 2 provides one or more data inputs and one or more data outputs and is connected to a switching unit 10 via a bidirectional full-duplex data link 3. The switching unit 10 comprises a routing fabric 5 and an arbitration unit 6. The routing fabric 5, typically a crossbar, comprises N input and N output ports. It is also possible to provide crossbars having different numbers of inputs and outputs.

Each line card 2 is also connected to the arbitration unit 6 with a dedicated bidirectional control link 4, which is used to exchange control messages between the line cards 2 and the arbitration unit 6. The arbitration unit 6 is connected to the crossbar 5 through a configuration link 7.

Each of the line cards 2 comprises a plurality of ingress queues 21 for buffering incoming data packets and an egress queue 22 for buffering outgoing data packets. The ingress queues 21 are designed as virtual output queues (VOQ), each dedicated to a specific output, wherein every ingress queue 21 stores data packets destined to the one specific assigned output port.

The crossbar 5 of the switching unit 10 is designed such that at any time an input can only be connected to one output and vice versa, i.e., there is a one-to-one matching between inputs and outputs. To obtain a good performance of the packet switching device in terms of latency and throughput, this matching is typically computed by the arbitration unit 6. The arbitration unit 6 receives requests from the line cards 2, where a request comprises an output port identifier meaning that the line card 2 that originated the request wishes to transmit a data packet to the output port identified by the output port identifier.

Based on the requests received from all line cards 2, the arbitration unit 6 computes a suitable one-to-one matching between input and output ports for the current time slot.

Based on the computed matching, the arbitration unit 6 then returns the corresponding grant information to the line cards 2. The grant information comprises an output port identifier meaning that the line card 2 receiving this grant information is allowed to transmit a packet to this specific output port. When a line card 2 receives grant information, it dequeues one packet from the corresponding ingress queue 21 and transmits it on the data link 3 to the crossbar 5. The crossbar 5 routes the incoming packets to the data links 3 according to the configuration determined by the matching computed by the arbitration unit 6 and applied to the crossbar 5 via the configuration link 7.

The arbitration unit 6 implements an algorithm to compute a one-to-one matching between the inputs and the outputs. The optimum solution to this problem is known to be too complex to implement in fast hardware. Therefore, a number of heuristic algorithms have been proposed, e.g. i-SLIP. Many of these heuristic algorithms are iterative, i.e. they repeat a given set of steps a number of times. Each step improves the matching obtained in the previous step until either no additional improvement is found or a predetermined number of steps have been executed. However, existing matches cannot be undone in subsequent steps.

As the packet size is typically fixed, the system is operated in a time-slotted fashion defining time slots, each time slot being equal to the duration of one packet. For maximum efficiency, the arbitration unit should provide one matching in every time slot. Therefore, the amount of time available to compute a matching is given by a minimum packet duration T_(c). The limits of the physical implementation determine how fast a single iteration of the matching algorithm can be executed; this time is denoted by T_(i). The number of iterations is typically fixed to a given number M. The time required for one matching then equals T_(M)=M*T_(i).

In FIG. 2, the architecture of the arbitration unit 6 and the line cards 2 is depicted in more detail. The arbitration unit 6 is connected via the control links 4 with each of the line cards 2.

The line cards 2 comprise the ingress queues 21 to store incoming data packets and egress queues 22 to store outgoing data packets. Incoming data packets are received by an enqueueing unit 25, which assigns an incoming data packet to the respective ingress queue 21, depending on the output to which the incoming data packet should be delivered. The queue occupancy information is provided to a control message unit 29, which is connected to the control link 4. The control message unit 29 generates control messages comprising requests to be transmitted to the arbitration unit 6, indicating the status of the ingress queues 21 and including information about the outputs for which the data packets in the ingress queues 21 are pending. The line card 2 also comprises a dequeueing unit 26 which receives a control message comprising a grant information transmitted via the control link 4, indicating within each time slot which of the ingress queues 21 is allowed to transmit a data packet to the respective output.

The arbitration unit 6 receives control messages comprising the request information and generates the control messages comprising the grant information while setting the crossbar 5 so that the waiting data packet selected by the grant information is transmitted via the crossbar to the respective line card to output the data packet.

The arbitration unit 6 comprises a request counter unit 61 wherein the pending requests generated by all of the connected line cards 2 are collected and buffered. The request counter unit 61 generates request information which is transmitted to an allocation unit 62 which performs a matching method to optimize the matching between the inputs and the outputs of the packet switching device. As a result of the matching method, the allocation unit 62 controls the crossbar 5 via the configuration link 7 and provides a respective grant information to the dequeueing unit 26 for each of the connected line cards 2.

The grant information signals to the respective line card 2 which queue's data packet is to be transmitted next via the crossbar 5 to the respective output. The generation of the configuration signals sent via the configuration link 7 to the crossbar 5, and the generation of the grant information and its transmission to the respective line cards 2, are designed such that the selected data packet from the line card 2 arrives at the crossbar 5 when the crossbar 5 is switched so that the data packet can be forwarded to the respective output.

In FIGS. 3 a to 3 d and 4 a and 4 b, a matching problem is discussed which arises in the configuration of a packet switching device having a number of inputs and a number of outputs, wherein a set of one-to-one interconnections between a set of inputs and a set of outputs should be established to forward data packets through the packet switching device.

In FIG. 3 a the ingress queues 21 of three line cards 2 each having one input and one output are depicted schematically. Each of the line cards 2 comprises three ingress queues, one for each possible output of the exemplary allocation device.

The filled boxes of the ingress queues 21 represent data packets waiting to be forwarded to an output associated with the respective ingress queue. In the ingress queues 21 of the upper first line card 2, a data packet in the first ingress queue has to be transmitted to the first output, two data packets in the second ingress queue 21 have to be transmitted to a second output, and one data packet in the third ingress queue has to be transmitted to a third output. In the ingress queues 21 of the second line card 2, three data packets in the first ingress queue have to be transmitted to the first output. In the third line card 2, one data packet in the second ingress queue has to be transmitted to the second output, and two data packets in the third ingress queue have to be transmitted to the third output.

Given the filling state of the ingress queues 21, a bipartite graph as shown in FIG. 3 b can be drawn, indicating all of the requested data links between inputs and outputs. The matching method now tries to optimise the configuration of the one-to-one interconnections so that as many data packets as possible can be forwarded to the respective outputs in each time slot.
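The queue filling state described for FIG. 3 a can be written as an occupancy matrix from which the edges of the bipartite request graph follow directly. The following is a minimal illustrative sketch; the names `voq_occupancy` and `request_edges` are hypothetical and do not appear in the embodiments.

```python
# Pending-packet counts per virtual output queue, taken from the FIG. 3 a example.
# Row i corresponds to input (line card) i+1, column j to output j+1.
voq_occupancy = [
    [1, 2, 1],  # first line card: 1 packet for output 1, 2 for output 2, 1 for output 3
    [3, 0, 0],  # second line card: 3 packets for output 1
    [0, 1, 2],  # third line card: 1 packet for output 2, 2 for output 3
]


def request_edges(occupancy):
    """Return the (input, output) edges of the bipartite request graph, 1-based."""
    return [
        (i + 1, j + 1)
        for i, row in enumerate(occupancy)
        for j, count in enumerate(row)
        if count > 0  # an edge exists whenever at least one packet is pending
    ]


edges = request_edges(voq_occupancy)
print(edges)
```

Each edge is a candidate interconnection; a matching selects a subset of these edges in which no input and no output appears more than once.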

As shown in FIG. 3 c, the optimisation problem is a real one, since there is also a non-optimum solution in which only two inputs are matched to two outputs while one of the inputs and one of the outputs cannot be used in this time slot. The matching given in FIG. 3 d connects the three inputs to the three outputs; it represents the optimized matching, which should be achieved by the iteratively performed matching method.

The matching method is normally performed in a number of steps iterating the matching solution. This is depicted in FIGS. 4 a and 4 b for the case of an iterative 3-phase matching algorithm such as i-SLIP, wherein, beginning with the bipartite graph shown in FIG. 3 b, which is represented by the request information stored in the request counter unit 61, one interconnection between one input and one output is selected, which in the given example is an interconnection between the first input and the third output. As the matching result of the first iteration is fixed, only a limited number of matching possibilities remain in the second iteration step. According to the request information provided by the request counter unit 61, a matching is possible wherein the second input is connected to the first output and the third input is connected to the second output. Therefore these interconnections will be added to the matching solution in the second iteration.
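The second iteration described above can be illustrated with a simplified request-grant-accept pass. This sketch is not i-SLIP itself: it replaces the round-robin pointers of i-SLIP with a fixed lowest-index priority, which is an assumption made only to keep the example short. The function name `rga_iteration` is hypothetical. The result of the first iteration (first input to third output) is taken from the description rather than computed, since it depends on arbiter state that is not specified here.

```python
def rga_iteration(requests, matching):
    """One request-grant-accept pass with fixed lowest-index priority.

    requests: set of (input, output) pairs with pending packets.
    matching: dict input -> output fixed by earlier iterations (never undone).
    Returns a new dict extended by the matches found in this pass.
    """
    taken_outputs = set(matching.values())
    # Request phase: only unmatched inputs ask, and only for unmatched outputs.
    open_requests = [
        (i, o) for (i, o) in sorted(requests)
        if i not in matching and o not in taken_outputs
    ]
    # Grant phase: every output grants the lowest-numbered requesting input.
    grants = {}
    for i, o in open_requests:
        grants.setdefault(o, i)
    # Accept phase: every input accepts the lowest-numbered granting output.
    accepted = {}
    for o in sorted(grants):
        accepted.setdefault(grants[o], o)
    return {**matching, **accepted}


requests = {(1, 1), (1, 2), (1, 3), (2, 1), (3, 2), (3, 3)}
after_first = {1: 3}  # first-iteration result as given in the description
after_second = rga_iteration(requests, after_first)
print(after_second)
```

Because matches from earlier iterations are carried over unchanged, the pass can only add interconnections, in line with the statement that existing matches cannot be undone.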

The number of iterations required is generally given by log₂(N); however, an optimized matching result may already be reached after a smaller number of iterations.

In FIGS. 5 a, 5 b and 5 c, the timing diagrams of three matching schemes are depicted and compared based on a sequence of four requests r_(S0), r_(S1), r_(S2), r_(S3), received at successive time slots S0, S1, S2 and S3. All three figures use the representation convention and symbols as illustrated in the legend of FIG. 5 d, i.e.:

- the x-axis indicates time slots from S0 to S7,
- the y-axis indicates allocation unit identifiers k, where k=1 to 4,
- the grey boxes represent the boundary request and grant conditions of four allocation units 1 to 4 at successive time slots S0 to S7, wherein
    - the ingress left arrows indicate pending requests r_(S0) to r_(S3) received at time slots S0 to S3,
    - the egress bottom arrows indicate matching grants g_(S1) to g_(S7) generated at time slots S1 to S7.

The time to complete one matching iteration is denoted by T_(i), the required number of iterations per arbitration by M, and the time slot by T_(c). The arbitration time T_(M) is then T_(M)=M*T_(i). If T_(M)>T_(c), parallelization, load balancing or pipelining is used to maintain efficiency. In the example shown here, T_(c)/T_(i)=2 and M=8.
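The timing relations above can be checked numerically for the example values. The following sketch assumes an arbitrary time unit in which T_(i)=1; the function names are hypothetical, and the rounding up to a whole number of units reflects the fact that a fractional allocation unit cannot be built.

```python
import math


def arbitration_time(m_iterations, t_iteration):
    """T_M = M * T_i: time needed for one complete matching."""
    return m_iterations * t_iteration


def units_needed(t_m, t_c):
    """Smallest whole number of units K with K * T_c >= T_M."""
    return math.ceil(t_m / t_c)


T_i = 1.0        # duration of one iteration (arbitrary unit, assumed)
T_c = 2.0 * T_i  # time slot, from T_c / T_i = 2
M = 8            # iterations per arbitration

T_M = arbitration_time(M, T_i)
K = units_needed(T_M, T_c)
print(T_M, K, T_M > T_c)
```

With these values T_(M) exceeds T_(c) by a factor of four, so a single unit completing all iterations cannot deliver one matching per time slot.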

In FIG. 5 a, the matching is performed by parallel allocation units which are independent of each other. To obtain one arbitration decision at every time slot, K=4 identical parallel units are employed, where K=T_(M)/T_(c). However, the cell latency in the absence of contention, i.e. the absolute minimum latency, in this scheme is equal to K*T_(c), because the allocation unit waits for all iterations to complete before returning a matching, even if this matching was produced in the first iteration.

In FIG. 5 b, the matching is performed by K=4 pipelined allocation units where K=M*T_(i)/T_(c) and where each unit executes I=T_(c)/T_(i) iterations before passing its matching result (denoted as grants in the figure) on to the next unit of the pipeline. This pipelining scheme incurs the same cell latency penalty of K*T_(c) as the parallel scheme, because the final matching cannot be delivered before all iterations are executed in sequence, even if this matching was produced in the first iteration.

In FIG. 5 c, the matching method according to the present invention is depicted. The presented method also comprises K=4 pipelined allocation units where a matching result is sequentially passed on to the next pipelined unit. However, here the scheme provides a parallel distribution of the requests to all the allocation units, which enables any of these units to shortcut the normal sequence of pipelined iterations and reduces the absolute minimum latency down to a single time slot (T_(c)). The presented matching method as indicated in FIG. 5 c produces a final matching g_(S1) in time slot S1 in response to the request r_(S0) received at time slot S0. Thus the latency is reduced to a single time slot (T_(c)). This latency is to be compared with both the parallel scheme of FIG. 5 a and the pipelined scheme of FIG. 5 b where the final matching g_(S4) produced in response to the received request r_(S0) occurs at time slot S4.
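The minimum latencies of the three schemes can be compared with a few lines of arithmetic. This sketch only restates the contention-free case discussed for FIGS. 5 a to 5 c; the function name and the scheme labels are hypothetical.

```python
def earliest_grant_slot(request_slot, scheme, k_units):
    """Earliest slot in which a request can be granted when there is no contention."""
    if scheme in ("parallel", "pipelined"):
        # FIG. 5 a and FIG. 5 b: all K * T_c of arbitration time must elapse.
        return request_slot + k_units
    if scheme == "pipelined with parallel requests":
        # FIG. 5 c: a unit may shortcut the pipeline, so one time slot suffices.
        return request_slot + 1
    raise ValueError(f"unknown scheme: {scheme}")


K = 4
for scheme in ("parallel", "pipelined", "pipelined with parallel requests"):
    print(scheme, "-> grant in slot S%d" % earliest_grant_slot(0, scheme, K))
```

For the request r_(S0) this yields slot S4 for the two prior-art schemes and slot S1 for the presented scheme, matching the grants g_(S4) and g_(S1) named above.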

In FIG. 6, a more detailed schematic diagram of the arbitration unit 6 is depicted. Particularly, the allocation unit 62 is shown in more detail illustrating the method for allocating pending requests for the transmission of data packets according to an embodiment of the present invention.

The allocation unit 62 comprises a number of allocation stages 63 each having an allocator 66, wherein the number of the allocation stages 63 is equal to or greater than the smallest integer that is equal to or greater than the arbitration time T_(M) divided by the time slot time T_(c). The allocators 66 provide a matching function for optimizing the performance of the matching.

In each of the allocation stages 63, a prefilter unit 65 is provided including a number of prefilter means (not shown in detail), specifically one for every ingress queue. The output of each of the allocators 66 is connected to a postfilter unit 67, each postfilter unit 67 including a number of postfilter means (not shown in detail), specifically one for every ingress queue.

The allocation stages 63 are connected in series so that an output of the postfilter unit 67 associated with one allocator 66 is connected to an input of the postfilter unit 67 and/or an input of the prefilter unit 65 of the successive allocation stage 63. The output of the postfilter unit 67 of the last allocator 66 of the series is connected to a grant coding unit 68 which generates grant information supplied to the control message units 64 and supplied to the crossbar 5 via the configuration links 7. In the configuration links 7 delay units 71 are provided which synchronize the switching in the crossbar 5 and the forwarding of the respective data packets to the determined outputs.

When the requests arrive via the control links 4, the request counter unit 61 decodes the requests and updates the status information of the corresponding ingress queues 21. The request counter unit 61 comprises a plurality of single counters, specifically one for every ingress queue 21 for each of the connected line cards 2. The request counter unit 61 generates request information in every time slot.

When a new request for a specific ingress queue 21 arrives, the corresponding counter is incremented. Each of the postfilter units 67 is connected to a grant counting unit 69 which counts the new grants for a specific ingress queue 21. When a new matching is obtained for a specific ingress queue 21, the corresponding counter is decremented according to the grant counting unit 69. In this manner, the request counter unit 61 represents the number of pending requests for the corresponding ingress queues 21 of each of the line cards 2. The request information generated by the request counter unit 61 is sent to the prefilter unit 65 of each of the allocation stages. Every prefilter unit 65 can forward the request information unchanged to the allocator 66 of the respective allocation stage or can modify the request information according to rules stated below. The decision whether to modify the request information before forwarding it to the corresponding allocator 66 is based on a predetermined rule.
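The counting behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `RequestCounters` and its methods are hypothetical, inputs and outputs are indexed from zero, and the request information is modelled simply as the set of input-output pairs whose counter is non-zero.

```python
class RequestCounters:
    """One counter per ingress queue, i.e. per (input, output) pair."""

    def __init__(self, n_inputs, n_outputs):
        self.pending = [[0] * n_outputs for _ in range(n_inputs)]

    def add_request(self, inp, out):
        # A newly arrived request increments the counter of its ingress queue.
        self.pending[inp][out] += 1

    def apply_grants(self, new_grants):
        # Newly added matchings, as collected per time slot, decrement the counters.
        for inp, out in new_grants:
            if self.pending[inp][out] == 0:
                raise ValueError("grant without pending request")
            self.pending[inp][out] -= 1

    def request_info(self):
        # Pairs that still have at least one unmatched request.
        return {
            (i, o)
            for i, row in enumerate(self.pending)
            for o, count in enumerate(row)
            if count > 0
        }


counters = RequestCounters(3, 3)
counters.add_request(0, 1)
counters.add_request(0, 1)
counters.apply_grants([(0, 1)])
print(counters.pending[0][1], counters.request_info())
```

Decrementing as soon as a matching is added, rather than when the packet is finally transmitted, keeps the counters limited to requests that no allocation stage has yet served.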

In a preferred embodiment, this decision is based on the current matching of the corresponding allocator 66, the value of the corresponding counter of the request counter unit 61 and/or the position of the allocation stage 63 in the series.

Every allocator 66 receives request information for zero, one or more of the ingress queues 21. It computes a matching according to some matching algorithm which is known from prior art and will not be discussed further herein. The matching method can be iterative and optimizes the configuration of the one-to-one interconnections between the inputs and the outputs of the packet switching device. If an iterative matching method is employed, each allocator 66 is designed to perform one or more iterations on the given request information and to forward an intermediate matching result to the corresponding postfilter unit 67 even if this intermediate matching result has not led to the final optimized solution. However, the scope of the present invention is not limited to iterative matching methods.

After each time slot, each of the allocators 66 of every allocation stage 63 outputs the respective partial matching result to the corresponding postfilter unit 67. Every postfilter unit 67 decides whether to modify the received partial matching. In a preferred embodiment, this decision of the filtering units is based on the matching of any allocation stage 63, the newly added matchings of the corresponding allocator 66 or other allocators 66, the status of the request counter unit 61 and/or the position of the allocation stage 63 in the series.

The postfilter unit 67 merges the filtered partial information with the first or intermediate matching information received from the preceding allocation stage 63 and forwards the merged matching information to the next allocation stage 63 in the pipeline.

The request counter unit 61 stores information on the pending requests, that is, all requests which are not yet matched. Because an input-output pairing generated in any of the allocation stages 63 and given by the matching information cannot be removed by successive allocation stages 63, each such input-output pairing related to a pending data packet at the respective input decreases the number of pending requests for the respective input-output pair by one. Because the matching is performed simultaneously in all of the allocation stages 63 and provided at the outputs of the allocators 66, the matching information is collected in the grant counting unit 69, which is connected to an output of each of the postfilter units 67 and to the request counter unit 61. The grant counting unit 69 controls the request counter unit 61 to decrease the number of pending requests by the number of newly added matchings per input-output pair in the respective time slot.
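The interaction between the pipelined stages and the pending-request counters can be sketched as a small simulation. The code is illustrative only and rests on several assumptions not found in the specification: `greedy_allocate` is a stand-in for the unspecified prior-art matching algorithm, the function names are hypothetical, and the stages are evaluated one after another in software so that each stage sees up-to-date counters. In hardware the stages operate simultaneously, and preventing surplus grants is the role of the postfilter units.

```python
def greedy_allocate(requests, matching):
    """Stand-in allocator: extend a partial matching, never removing existing pairs."""
    used_inputs = {i for i, _ in matching}
    used_outputs = {o for _, o in matching}
    added = set()
    for i, row in enumerate(requests):
        if i in used_inputs:
            continue
        for o, wanted in enumerate(row):
            if wanted and o not in used_outputs:
                added.add((i, o))
                used_inputs.add(i)
                used_outputs.add(o)
                break
    return added


def run_time_slot(pending, pipeline):
    """Advance every stage by one time slot.

    pending[i][o] counts unmatched requests; pipeline[k] is the matching
    that stage k extends in this slot. Returns the final matching that
    leaves the last stage.
    """
    outputs = []
    for matching in pipeline:
        requests = [[count > 0 for count in row] for row in pending]
        added = greedy_allocate(requests, matching)
        # Each newly added pair consumes one pending request.
        for i, o in added:
            pending[i][o] -= 1
        outputs.append(matching | added)
    # Shift: an empty matching enters stage 0, the last stage's result leaves.
    pipeline[:] = [set()] + outputs[:-1]
    return outputs[-1]


pending = [[1, 0], [1, 1]]
pipeline = [set(), set()]  # two allocation stages
print(run_time_slot(pending, pipeline))
print(run_time_slot(pending, pipeline))
```

With two stages, a matching started in one time slot leaves the pipeline one slot later, while a final matching is still produced in every slot.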

One main function of the postfilter units 67 is to optimize the performance, in particular to prevent too many grants from being issued.

The grant coding unit 68 receives the matching result in the form of a matrix indicating the input-output pairings selected by the matching algorithm and generates the control message, compressing the matching information into a smaller control message format.
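The specification does not define the compressed control message format. Purely as an illustration, the sketch below assumes one plausible encoding in which the matching matrix is reduced to one output index per input, with `None` meaning no grant. The function name `encode_grants` is hypothetical.

```python
def encode_grants(matching):
    """Compress a boolean matching matrix into one entry per input.

    matching[i][o] is True if input i is paired with output o.
    Assumed message format: list whose i-th entry is the granted
    output index for input i, or None if input i received no grant.
    """
    message = []
    for row in matching:
        outputs = [o for o, granted in enumerate(row) if granted]
        if len(outputs) > 1:
            raise ValueError("an input can be matched to at most one output")
        message.append(outputs[0] if outputs else None)
    return message


print(encode_grants([[False, True], [False, False]]))
```

For an N-by-N switch this reduces N*N matrix entries to N indices, which is the kind of saving a compact grant message aims for.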

In FIG. 7, a more detailed diagram of one exemplary prefilter unit 65 is depicted. The prefilter unit 65 comprises a match filter 650 to filter out requests for inputs and outputs that have already been matched, as given by the matching information of the previous allocation stage 63. It further comprises a request filter 651 to implement a request filter function deciding whether the request information is to be applied to the respective allocation stage 63 or not. The request filter 651 can be used to optimize the performance of the arbitration unit 6 by controlling the flow of the request information output by the request counter unit 61. The request filter 651 is optional.

The prefilter unit 65 further comprises an AND gate 652, an output of which is connected to the allocator 66. The output of the match filter 650 is connected to an inverted input of the AND gate 652. Another input of the AND gate 652 is connected to one output of the request filter 651. The resulting output of the AND gate 652 indicates whether a request sent by the request counter unit 61 should be considered in the allocator 66 of the respective allocation stage 63 or not.
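The gate logic of the prefilter unit 65 can be written out per input-output pair. The sketch below is an assumption-laden illustration: the function name `prefilter`, the boolean-matrix representation, and the optional `request_filter` callback are hypothetical, and the match filter is modeled as blocking any pair whose input or output already appears in the previous stage's matching.

```python
def prefilter(requests, prev_matching, request_filter=None):
    """Return the requests that the allocator should consider.

    requests[i][o]  : True if a request from input i to output o is pending
    prev_matching   : set of (input, output) pairs from the previous stage
    request_filter  : optional callable (i, o) -> bool modeling request filter 651
    """
    matched_inputs = {i for i, _ in prev_matching}
    matched_outputs = {o for _, o in prev_matching}
    result = []
    for i, row in enumerate(requests):
        out_row = []
        for o, requested in enumerate(row):
            # Match filter 650: True if input or output is already matched.
            already_matched = i in matched_inputs or o in matched_outputs
            # Request filter 651 (optional): passes the request through or blocks it.
            passed = requested and (request_filter is None or request_filter(i, o))
            # AND gate 652 with the match-filter signal on its inverted input.
            out_row.append(bool(passed and not already_matched))
        result.append(out_row)
    return result


print(prefilter([[True, True], [True, False]], {(0, 1)}))
```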

In FIG. 8, a more detailed diagram of one exemplary postfilter unit 67 is depicted. The postfilter unit 67 comprises in the given example a filtering unit 670 performing a postfiltering function depending on the grants and optionally certain other variables. The postfiltering function filters out the one or more grants to be removed. The filtering unit 670 receives as one input the grant information resulting from the associated allocator 66. In the shown example, an output of the filtering unit 670 that is false indicates one or more grants to be removed from the result of the respective allocator 66 associated with the postfilter unit 67. The filtering unit 670 is optional and can therefore be omitted in other embodiments.

The postfilter unit 67 further comprises an AND gate 671, to the inputs of which the grant information and the filtering decision are applied to perform the actual filtering. Furthermore, an OR gate 672 is provided which has as inputs the filtered grant information received from the output of the AND gate 671 and the matching result of the previous allocation stage 63, and which merges the provided information into the matching information of the current allocation stage 63.
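The filter-then-merge behavior of the postfilter unit 67 reduces to one boolean expression per input-output pair. The following is a minimal sketch; the function name `postfilter`, the matrix representation, and the `keep` callback standing in for the filtering unit 670 are assumptions introduced for illustration.

```python
def postfilter(grants, prev_matching, keep=None):
    """Filter the allocator's new grants and merge them with the previous matching.

    grants[i][o]        : True if the allocator newly matched input i to output o
    prev_matching[i][o] : True if the previous stage already matched that pair
    keep                : optional callable (i, o) -> bool modeling the filtering
                          unit 670; False means the grant is removed
    """
    merged = []
    for i, row in enumerate(grants):
        merged_row = []
        for o, granted in enumerate(row):
            decision = True if keep is None else keep(i, o)
            filtered = granted and decision                          # AND gate 671
            merged_row.append(bool(filtered or prev_matching[i][o]))  # OR gate 672
        merged.append(merged_row)
    return merged


new_grants = [[False, False], [True, False]]
previous = [[False, True], [False, False]]
print(postfilter(new_grants, previous))
```

Because the previous matching enters only through the OR gate, pairs established by earlier stages always survive, which is consistent with the statement that a generated pairing cannot be removed by successive stages.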

1. A method of transmitting packets from devices to output ports, the method comprising: providing a plurality of requests to transmit data packets from a plurality of devices, wherein each request corresponds to one of a plurality of input queues of one of the devices and includes an output port identifier for transmitting data packets to one of a plurality of output ports; receiving the requests in parallel at respective inputs of a plurality of sequential allocation stages, wherein an output of each stage is connected to an input of a subsequent stage; all of the allocation stages performing a matching based on the requests to generate partial matching information, wherein the partial matching information is a matching of less than all the devices to a corresponding one of the output ports; transferring, by each of the allocation stages, respective partial matching information to a subsequent allocation stage in the plurality of sequential allocation stages; excluding, by a sequential subset of the allocation stages, at least a first sequential allocation stage, and the subset performing a matching based on the requests and the respective partial matching information to generate complete matching information, wherein the complete matching information is a matching of all the devices to a corresponding one of the output ports; and granting permission to an input queue of each of the devices for a corresponding one of the output ports using the completed matching information from a last stage of the plurality of sequential allocation stages, wherein each matching is based on the same devices and output ports.
2. The method of claim 1, further comprising transferring the partial matching information from a current stage of the plurality of allocation stages to a subsequent stage of the plurality of allocation stages.
3. The method of claim 2, wherein the transferring of the partial matching information from a stage of the plurality to a subsequent stage of the plurality is based on a number of the requests that are pending.
4. The method of claim 2, wherein the transferring of the partial matching information from a current stage of the plurality of allocation stages to a subsequent stage of the plurality of allocation stages is based on a position of the current stage with the plurality of allocation stages.
5. The method of claim 1, further comprising transmitting the data packets from each of the input queues that were granted permission to a corresponding one of the output ports.
6. An arbitration unit comprising: a plurality of sequential allocation stages, wherein an output of each stage is connected to an input of a subsequent stage; a request unit providing requests to transmit data packets from a plurality of input devices in parallel to an input of each of the stages, wherein each request includes an output port identifier for transmitting data packets to one of a plurality of output ports; a grant unit providing final matching information from a last stage of the plurality of allocation stages to the input devices, wherein each stage is configured to perform a first matching based on the requests to generate partial matching information during a first period, the first matching based on the same input devices and output ports, wherein less than all of the stages are configured to perform a second matching based on the requests and the partial matching information to generate final matching information during a second period, the second matching based on the same input devices and output ports, wherein the final matching information is a matching of all the requesting devices to a corresponding one of the output ports.
7. The arbitration unit of claim 6, wherein each of the stages is configured to perform the matching iteratively based on the received requests and the partial matching information from a preceding one of the stages.
8. The arbitration unit of claim 7, wherein at least one of the allocation stages comprises: an allocator to perform the matching; and a prefilter to perform one of a forwarding of the requests to the allocator or a forwarding of modified information to the allocator, wherein the modified information is based on the requests and the partial matching information from a preceding stage.
9. The arbitration unit of claim 8, wherein the prefilter determines whether to forward the modified information based on a current matching in the partial matching information from the preceding stage.
10. The arbitration unit of claim 8, wherein the prefilter determines whether to forward the modified information based on a number of the requests that are pending.
11. The arbitration unit of claim 8, wherein the prefilter determines whether to forward the modified information based on a position of the corresponding allocation stage within the plurality of allocation stages.
12. The arbitration unit of claim 8, wherein at least one of the allocation stages further comprises a postfilter unit for filtering out at least one match in the matching information.
13. The arbitration unit of claim 6, wherein each allocation stage includes an allocation unit to perform the first and second matchings.
14. The arbitration unit of claim 6, wherein the request unit comprises a plurality of counters, wherein each counter corresponds to one of the input ports for counting a number of the requests that are pending for a particular output port.
15. The arbitration unit of claim 6, further comprising a selection unit to selectively provide the requests in parallel to each of the allocation stages.
16. A method of scheduling packet transmissions from input ports of a switching system to output ports of said switching system, the method comprising: 1) operating in parallel a plurality of allocation stages to compute a plurality of matching informations over the course of a plurality of successive time slots; a) one of the matching informations being a final matching information and the others being intermediate matching informations, wherein a final matching information is a matching computed over the course of all the successive time slots and an intermediate matching information is a matching computed over the course of less than all the successive time slots; 2) performing in each time slot the following steps: a) providing a plurality of requests to transmit data packets from a plurality of input ports, wherein each request corresponds to one of the input ports and includes an output port identifier for transmitting data packets to one of a plurality of output ports; b) receiving the requests in parallel at respective inputs of the allocation stages, one of the allocation stages generating a final matching information based on a preceding intermediate matching information and the requests received; c) the other allocation stages each generating a new intermediate matching information based on preceding intermediate matching informations and the requests received; and d) granting permission to the requesting input ports for a corresponding one of the output ports according to the final matching information, wherein the generating of the intermediate matching and the final matching information are based on the same input ports and output ports.
17. The method of claim 16, further comprising transferring each intermediate matching information from a current one of the allocation stages to a subsequent stage one of the allocation stages such that the final matching information is obtained from a last stage of the allocation stages in each subsequent time slot.
18. The method of claim 17, further comprising transferring the intermediate matching information from each stage to a subsequent stage.
19. The method of claim 16, further comprising transmitting the data packets from each of the input ports that were granted permission to a corresponding one of the output ports.